Voice activity detection

ABSTRACT

A voice activity detector (VAD) for use in an LPC coder in a mobile radio system uses autocorrelation coefficients R_0, R_1, . . . of the input signal, weighted and combined, to provide a measure M which depends on the power within that part of the spectrum containing no noise, which is thresholded against a variable threshold to provide a speech/no speech logic output. The measure is formula (I), where H_i are the autocorrelation coefficients of the impulse response of an Nth order FIR inverse noise filter derived from LPC analysis of previous non-speech signal frames. Threshold adaption and coefficient update are controlled by a second VAD responsive to the rate of spectral change between frames.

This is a continuation of application Ser. No. 07/555,445, filed Aug. 15, 1990, now abandoned.

BACKGROUND OF THE INVENTION

A voice activity detector is a device which is supplied with a signal with the object of detecting periods of speech, or periods containing only noise. Although the present invention is not limited thereto, one application of particular interest for such detectors is in mobile radio telephone systems where the knowledge as to the presence or otherwise of speech can be exploited by a speech coder to improve the efficient utilisation of radio spectrum, and where also the noise level (from a vehicle-mounted unit) is likely to be high.

The essence of voice activity detection is to locate a measure which differs appreciably between speech and non-speech periods. In apparatus which includes a speech coder, a number of parameters are readily available from one or other stage of the coder, and it is therefore desirable to economise on processing needed by utilising some such parameter. In many environments, the main noise sources occur in known defined areas of the frequency spectrum. For example, in a moving car much of the noise (e.g., engine noise) is concentrated in the low frequency regions of the spectrum. Where such knowledge of the spectral position of noise is available, it is desirable to base the decision as to whether speech is present or absent upon measurements taken from that portion of the spectrum which contains relatively little noise. It would, of course, be possible in practice to pre-filter the signal before analysing to detect speech activity, but where the voice activity detector follows the output of a speech coder, prefiltering would distort the voice signal to be coded.

SUMMARY OF THE INVENTION

According to the invention there is provided a voice activity detection apparatus comprising means for receiving an input signal, means for periodically adaptively generating an estimate of the noise signal component of the input signal, means for periodically forming a measure M of the spectral similarity between a portion of the input signal and the noise signal component, means for comparing a parameter derived from the measure M with a threshold value T, and means for producing an output to indicate the presence or absence of speech in dependence upon whether or not that value is exceeded.

Preferably, the measure is the Itakura-Saito Distortion Measure.

BRIEF DESCRIPTION OF THE DRAWINGS

Other aspects of the present invention are as defined in the claims.

Some embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of a first embodiment of the invention;

FIG. 2 shows a second embodiment of the invention;

FIG. 3 shows a third, preferred embodiment of the invention.

DETAILED DESCRIPTION OF THE DRAWINGS

The general principle underlying a first Voice Activity Detector according to a first embodiment of the invention is as follows.

A frame of n signal samples ##EQU1##

The zero order autocorrelation coefficient is the sum of each term squared, which may be normalized, i.e. divided by the total number of terms (for constant frame lengths it is easier to omit the division); that of the filtered signal is thus

    R'_0 = \sum_{i} (s'_i)^2

and this is therefore a measure of the power of the notional filtered signal s', in other words, of that part of the signal s which falls within the passband of the notional filter.

Expanding, neglecting the first 4 terms, ##EQU3##

So R'₀ can be obtained from a combination of the autocorrelation coefficients R_i, weighted by the bracketed constants which determine the frequency band to which the value of R'₀ is responsive. In fact, the bracketed terms are the autocorrelation coefficients of the impulse response of the notional filter, so that the expression above may be simplified to

    R'_0 = R_0 H_0 + 2 \sum_{i=1}^{N} R_i H_i        (1)

where N is the filter order and H_i are the (un-normalised) autocorrelation coefficients of the impulse response of the filter.

In other words, the effect on the signal autocorrelation coefficients of filtering a signal may be simulated by producing a weighted sum of the autocorrelation coefficients of the (unfiltered) signal, the weights being the autocorrelation coefficients of the impulse response that the required filter would have had.

Thus, a relatively simple algorithm, involving a small number of multiplication operations, may simulate the effect of a digital filter requiring typically a hundred times this number of multiplication operations.
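The equivalence can be checked numerically. The following is a minimal sketch rather than part of the disclosed apparatus: the function names, the test frame and the filter taps are illustrative assumptions. It compares the weighted sum of autocorrelation coefficients with the energy obtained by explicitly filtering one frame. When the filter's tail beyond the end of the frame is included in the explicit calculation, the two figures agree to rounding error; restricting the filtered output to the frame itself would make the agreement approximate only.

```python
import math

def autocorr(x, max_lag):
    """Un-normalised autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[k] * x[k + lag] for k in range(n - lag))
            for lag in range(max_lag + 1)]

def weighted_measure(R, H):
    """R[0]*H[0] + 2*sum(R[i]*H[i]) for i = 1..N."""
    return R[0] * H[0] + 2.0 * sum(r * h for r, h in zip(R[1:], H[1:]))

def filtered_energy(s, h):
    """Energy of s convolved with FIR impulse response h (tail included)."""
    n, m = len(s), len(h)
    energy = 0.0
    for k in range(n + m - 1):
        y = sum(h[j] * s[k - j] for j in range(m) if 0 <= k - j < n)
        energy += y * y
    return energy

# Hypothetical 160-sample frame and a 4th-order notional filter (1, h0..h3).
frame = [math.sin(0.3 * k) + 0.5 * math.cos(1.7 * k) for k in range(160)]
h = [1.0, -0.9, 0.4, -0.2, 0.1]
order = len(h) - 1

R = autocorr(frame, order)   # signal autocorrelation R_0..R_N
H = autocorr(h, order)       # impulse-response autocorrelation H_0..H_N
direct = filtered_energy(frame, h)       # explicit filtering, n*(N+1) multiplies
via_autocorr = weighted_measure(R, H)    # N+1 multiplies once R is available
```

Since R is already produced as a step in LPC analysis, only the final weighted sum is extra work in a coder of the kind described.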

This filtering operation may alternatively be viewed as a form of spectrum comparison, with the signal spectrum being matched against a reference spectrum (the inverse of the response of the notional filter). Since the notional filter in this application is selected so as to approximate the inverse of the noise spectrum, this operation may be viewed as a spectral comparison between speech and noise spectra, and the zeroth autocorrelation coefficient thus generated (i.e. the energy of the inverse filtered signal) as a measure of dissimilarity between the spectra. The Itakura-Saito distortion measure is used in LPC to assess the match between the predictor filter and the input spectrum, and in one form is expressed as

    M = R_0 A_0 + 2 \sum_{i=1}^{N} R_i A_i

where A₀ etc are the autocorrelation coefficients of the LPC parameter set. It will be seen that this is closely similar to the relationship derived above, and when it is remembered that the LPC coefficients are the taps of an FIR filter having the inverse spectral response of the input signal so that the LPC coefficient set is the impulse response of the inverse LPC filter, it will be apparent that the Itakura-Saito Distortion Measure is in fact merely a form of equation 1, wherein the filter response H is the inverse of the spectral shape of an all-pole model of the input signal.

In fact, it is also possible to transpose the spectra, using the LPC coefficients of the test spectrum and the autocorrelation coefficients of the reference spectrum, to obtain a different measure of spectral similarity.

The I-S Distortion measure is further discussed in "Speech Coding based upon Vector Quantisation" by A Buzo, A H Gray, R M Gray and J D Markel, IEEE Trans on ASSP, Vol ASSP-28, No 5, October 1980.

Since the frames of signal have only a finite length, and a number of terms (N, where N is the filter order) are neglected, the above result is an approximation only; it gives, however, a surprisingly good indicator of the presence or absence of speech and thus may be used as a measure M in speech detection. In an environment where the noise spectrum is well known and stationary, it is quite possible to simply employ fixed h₀, h₁ etc coefficients to model the inverse noise filter.

However, apparatus which can adapt to different noise environments is much more widely useful.

Referring to FIG. 1, in a first embodiment, a signal from a microphone (not shown) is received at an input 1 and converted to digital samples s at a suitable sampling rate by an analogue to digital converter 2. An LPC analysis unit 3 (in a known type of LPC coder) then derives, for successive frames of n (e.g. 160) samples, a set of N (e.g. 8 or 12) LPC filter coefficients L_i which are transmitted to represent the input speech. The speech signal s also enters a correlator unit 4 (normally part of the LPC coder 3 since the autocorrelation vector R_i of the speech is also usually produced as a step in the LPC analysis although it will be appreciated that a separate correlator could be provided). The correlator 4 produces the autocorrelation vector R_i, including the zero order correlation coefficient R₀ and at least 2 further autocorrelation coefficients R₁, R₂, R₃. These are then supplied to a multiplier unit 5.

A second input 11 is connected to a second microphone located distant from the speaker so as to receive only background noise. The input from this microphone is converted to a digital input sample train by AD converter 12 and LPC analysed by a second LPC analyser 13. The "noise" LPC coefficients produced from analyser 13 are passed to correlator unit 14, and the autocorrelation vector thus produced is multiplied term by term with the autocorrelation coefficients R_i of the input signal from the speech microphone in multiplier 5 and the weighted coefficients thus produced are combined in adder 6 according to Equation 1, so as to apply a filter having the inverse shape of the noise spectrum from the noise-only microphone (which in practice is the same as the shape of the noise spectrum in the signal-plus-noise microphone) and thus filter out most of the noise. The resulting measure M is thresholded by thresholder 7 to produce a logic output 8 indicating the presence or absence of speech; if M is high, speech is deemed to be present.

This embodiment does, however, require two microphones and two LPC analysers, which adds to the expense and complexity of the equipment necessary.

Alternatively, another embodiment uses a corresponding measure formed using the autocorrelations from the noise microphone 11 and the LPC coefficients from the main microphone 1, so that an extra autocorrelator rather than an LPC analyser is necessary.

These embodiments are therefore able to operate within different environments having noise at different frequencies, or within a changing noise spectrum in a given environment.

Referring to FIG. 2, in the preferred embodiment of the invention, there is provided a buffer 15 which stores a set of LPC coefficients (or the autocorrelation vector of the set) derived from the microphone input 1 in a period identified as being a "non speech" (i.e. noise only) period. These coefficients are then used to derive a measure using equation 1, which also of course corresponds to the Itakura-Saito Distortion Measure, except that a single stored frame of LPC coefficients corresponding to an approximation of the inverse noise spectrum is used, rather than the present frame of LPC coefficients.

The LPC coefficient vector L_i output by analyser 3 is also routed to a correlator 14, which produces the autocorrelation vector of the LPC coefficient vector. The buffer memory 15 is controlled by the speech/non-speech output of thresholder 7, in such a way that during "speech" frames the buffer retains the "noise" autocorrelation coefficients, but during "noise" frames a new set of LPC coefficients may be used to update the buffer, for example by a multiple switch 16, via which outputs of the correlator 14, carrying each autocorrelation coefficient, are connected to the buffer 15. It will be appreciated that correlator 14 could be positioned after buffer 15. Further, the speech/no-speech decision for coefficient update need not be from output 8, but could be (and preferably is) otherwise derived.

Since frequent periods without speech occur, the LPC coefficients stored in the buffer are updated from time to time, so that the apparatus is thus capable of tracking changes in the noise spectrum. It will be appreciated that such updating of the buffer may be necessary only occasionally, or may occur only once at the start of operation of the detector, if (as is often the case) the noise spectrum is relatively stationary over time, but in a mobile radio environment frequent updating is preferred.

In a modification of this embodiment, the system initially employs equation 1 with coefficient terms corresponding to a simple fixed high pass filter, and then subsequently starts to adapt by switching over to using "noise period" LPC coefficients. If, for some reason, speech detection fails, the system may return to using the simple high pass filter.

It is possible to normalise the above measure by dividing through by R₀, so that the expression to be thresholded has the form

    M = H_0 + \frac{2}{R_0} \sum_{i=1}^{N} R_i H_i

This measure is independent of the total signal energy in a frame and is thus compensated for gross signal level changes, but gives rather less marked contrast between "noise" and "speech" levels and is hence preferably not employed in high-noise environments.
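The level independence of the normalised measure can be illustrated with a short sketch. It assumes the normalised form is simply equation 1 divided through by R₀; the function names, the test frame and the filter taps are hypothetical, and the frame is assumed not to be silent (R₀ greater than zero).

```python
import math

def autocorr(x, max_lag):
    """Un-normalised autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[k] * x[k + lag] for k in range(n - lag))
            for lag in range(max_lag + 1)]

def normalised_measure(R, H):
    """Equation 1 divided by R[0]; assumes R[0] > 0 (non-silent frame)."""
    return H[0] + 2.0 * sum(r * h for r, h in zip(R[1:], H[1:])) / R[0]

# Hypothetical frame and a 2nd-order notional inverse noise filter.
frame = [math.sin(0.3 * k) + 0.5 * math.cos(1.7 * k) for k in range(160)]
H = autocorr([1.0, -0.9, 0.4], 2)

quiet = normalised_measure(autocorr(frame, 2), H)
# The same frame at ten times the amplitude (100 times the energy).
loud = normalised_measure(autocorr([10.0 * v for v in frame], 2), H)
```

Both calls return the same value to within rounding, because every R_i scales by the same factor as R₀.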

Instead of employing LPC analysis to derive the inverse filter coefficients of the noise signal (from either the noise microphone or noise only periods, as in the various embodiments described above), it is possible to model the inverse noise spectrum using an adaptive filter of known type; as the noise spectrum changes only slowly (as discussed below) a relatively slow coefficient adaption rate common for such filters is acceptable. In one embodiment, which corresponds to FIG. 1, LPC analysis unit 13 is simply replaced by an adaptive filter (for example a transversal FIR or lattice filter), connected so as to whiten the noise input by modelling the inverse filter, and its coefficients are supplied as before to autocorrelator 14.

In a second embodiment, corresponding to that of FIG. 2, LPC analysis means 3 is replaced by such an adaptive filter, and buffer means 15 is omitted, but switch 16 operates to prevent the adaptive filter from adapting its coefficients during speech periods.

A second Voice Activity Detector for use with another embodiment of the invention will now be described.

From the foregoing, it will be apparent that the LPC coefficient vector is simply the impulse response of an FIR filter which has a response approximating the inverse spectral shape of the input signal. When the Itakura-Saito Distortion Measure between adjacent frames is formed, this is in fact equal to the power of the signal, as filtered by the LPC filter of the previous frame. So if spectra of adjacent frames differ little, a correspondingly small amount of the spectral power of a frame will escape filtering and the measure will be low. Correspondingly, a large interframe spectral difference produces a high Itakura-Saito Distortion Measure, so that the measure reflects the spectral similarity of adjacent frames. In a speech coder, it is desirable to minimise the data rate, so frame length is made as long as possible; in other words, if the frame length is long enough, then a speech signal should show a significant spectral change from frame to frame (if it does not, the coding is redundant). Noise, on the other hand, has a slowly varying spectral shape from frame to frame, and so in a period where speech is absent from the signal then the Itakura-Saito Distortion Measure will correspondingly be low, since applying the inverse LPC filter from the previous frame "filters out" most of the noise power.
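The behaviour of the inter-frame measure can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the Levinson-Durbin recursion is used here as one standard way of obtaining the inverse filter from the autocorrelation vector (the text does not name a method), the measure is energy-normalised by R₀, and the frames are synthetic sinusoids with a little pseudo-random noise. All function names are hypothetical.

```python
import math
import random

def autocorr(x, max_lag):
    """Un-normalised autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[k] * x[k + lag] for k in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc_inverse_filter(R, order):
    """Levinson-Durbin recursion; returns inverse filter [1, a_1..a_order]."""
    a = [1.0] + [0.0] * order
    err = R[0]
    for i in range(1, order + 1):
        acc = sum(a[j] * R[i - j] for j in range(i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i + 1):
            a[j] = prev[j] + k * prev[i - j]
        err *= 1.0 - k * k                  # remaining prediction error
    return a

def interframe_measure(R_now, a_prev):
    """Normalised measure from present-frame R and an earlier frame's filter."""
    B = autocorr(a_prev, len(a_prev) - 1)
    return B[0] + 2.0 * sum(r * b for r, b in zip(R_now[1:], B[1:])) / R_now[0]

def make_frame(freq, rng, n=160):
    """Synthetic frame: one sinusoid plus low-level uniform noise."""
    return [math.sin(freq * k) + 0.1 * rng.uniform(-1.0, 1.0) for k in range(n)]

ORDER = 2
rng = random.Random(0)
frame_a = make_frame(0.2, rng)   # "previous" frame
frame_b = make_frame(0.2, rng)   # same spectral shape as frame_a
frame_c = make_frame(2.5, rng)   # markedly different spectral shape

a_prev = lpc_inverse_filter(autocorr(frame_a, ORDER), ORDER)
m_similar = interframe_measure(autocorr(frame_b, ORDER), a_prev)
m_different = interframe_measure(autocorr(frame_c, ORDER), a_prev)
```

With these inputs `m_different` comes out much larger than `m_similar`, which is the contrast the detector relies on.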

Typically, the Itakura-Saito Distortion Measure between adjacent frames of a noisy signal containing intermittent speech is higher during periods of speech than periods of noise; the degree of variation (as illustrated by the standard deviation) is also higher, and less intermittently variable.

It is noted that the standard deviation of the standard deviation of M is also a reliable measure; the effect of taking each standard deviation is essentially to smooth the measure.

In this second form of Voice Activity Detector, the measured parameter used to decide whether speech is present is preferably the standard deviation of the Itakura-Saito Distortion Measure, but other measures of variance and other spectral distortion measures (based for example on FFT analysis) could be employed.

It is found advantageous to employ an adaptive threshold in voice activity detection. Such thresholds must not be adjusted during speech periods or the speech signal will be thresholded out. It is accordingly necessary to control the threshold adapter using a speech/non-speech control signal, and it is preferable that this control signal should be independent of the output of the threshold adapter. The threshold T is adaptively adjusted so as to keep the threshold level just above the level of the measure M when noise only is present. Since the measure will in general vary randomly when noise is present, the threshold is varied by determining an average level over a number of blocks, and setting the threshold at a level proportional to this average. In a noisy environment this is not usually sufficient, however, and so an assessment of the degree of variation of the parameter over several blocks is also taken into account.

The threshold value T is therefore preferably calculated according to

    T=M'+K·d

where M' is the average value of the measure over a number of consecutive frames, d is the standard deviation of the measure over those frames, and K is a constant (which may typically be 2).
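As a small worked sketch of this calculation, the function below computes T from a list of measure values taken from noise-only frames. The function name is hypothetical, and the use of the population standard deviation (rather than the sample form) is an assumption, since the text does not say which is intended.

```python
import statistics

def adaptive_threshold(noise_measures, K=2.0):
    """T = M' + K*d over recent noise-only frames (population std assumed)."""
    m_avg = statistics.mean(noise_measures)   # M'
    d = statistics.pstdev(noise_measures)     # d
    return m_avg + K * d

# Hypothetical measure values from three consecutive noise frames.
T = adaptive_threshold([1.0, 2.0, 3.0])
```

Here M' is 2 and d is the square root of 2/3, so T is a little over 3.63; a perfectly steady measure would give T equal to M'.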

In practice, it is preferred not to resume adaptation immediately after speech is indicated to be absent, but to wait to ensure the fall is stable (to avoid rapid repeated switching between the adapting and non-adapting states).

Referring to FIG. 3, in a preferred embodiment of the invention incorporating the above aspects, an input 1 receives a signal which is sampled and digitised by analogue to digital converter (ADC) 2, and supplied to the input of an inverse filter analyser 3, which in practice is part of a speech coder with which the voice activity detector is to work, and which generates coefficients L_i (typically 8) of a filter corresponding to the inverse of the input signal spectrum. The digitised signal is also supplied to an autocorrelator 4 (which is part of analyser 3), which generates the autocorrelation vector R_i of the input signal (or at least as many low order terms as there are LPC coefficients). Operation of these parts of the apparatus is as described with reference to FIGS. 1 and 2. Preferably, the autocorrelation coefficients R_i are then averaged over several successive speech frames (typically 5-20 ms long) to improve their reliability. This may be achieved by storing each set of autocorrelation coefficients output by autocorrelator 4 in a buffer 4a, and employing an averager 4b to produce a weighted sum of the current autocorrelation coefficients R_i and those from previous frames stored in and supplied from buffer 4a. The averaged autocorrelation coefficients Ra_i thus derived are supplied to weighting and adding means 5, 6 which receives also the autocorrelation vector A_i of stored noise-period inverse filter coefficients L_i from an autocorrelator 14 via buffer 15, and forms from Ra_i and A_i a measure M preferably defined as: ##EQU7##
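The averaging of autocorrelation vectors over successive frames might be sketched as below. The class name, the history length and the choice of equal weights are assumptions made for illustration; the text specifies only that a weighted sum of the current and stored vectors is formed.

```python
from collections import deque

class AutocorrAverager:
    """Averages autocorrelation vectors over the most recent frames."""

    def __init__(self, history=4):
        # Bounded store of recent vectors, in the role of buffer 4a.
        self.frames = deque(maxlen=history)

    def update(self, R):
        """Store the current vector and return the averaged vector Ra."""
        self.frames.append(list(R))
        count = len(self.frames)
        return [sum(f[i] for f in self.frames) / count for i in range(len(R))]

# Hypothetical two-term vectors, averaged over at most two frames.
averager = AutocorrAverager(history=2)
first = averager.update([2.0, 0.0])
second = averager.update([4.0, 2.0])
third = averager.update([6.0, 4.0])   # the oldest vector has dropped out
```

Unequal weights, for example favouring the current frame, would fit the description equally well.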

This measure is then thresholded by thresholder 7 against a threshold level, and the logical result provides an indication of the presence or absence of speech at output 8.

In order that the inverse filter coefficients L_i correspond to a fair estimate of the inverse of the noise spectrum, it is desirable to update these coefficients during periods of noise (and, of course, not to update during periods of speech). It is, however, preferable that the speech/non-speech decision on which the updating is based does not depend upon the result of the updating, or else a single wrongly identified frame of signal may result in the voice activity detector subsequently going "out of lock" and wrongly identifying following frames. Preferably, therefore, there is provided a control signal generating circuit 20, effectively a separate voice activity detector, which forms an independent control signal indicating the presence or absence of speech to control inverse filter analyser 3 (or buffer 15) so that the inverse filter autocorrelation coefficients A_i used to form the measure M are only updated during "noise" periods. The control signal generator circuit 20 includes LPC analyser 21 (which again may be part of a speech coder and, specifically, may be performed by analyser 3), which produces a set of LPC coefficients M_i corresponding to the input signal and an autocorrelator 21a (which may be performed by autocorrelator 3a) which derives the autocorrelation coefficients B_i of M_i. If analyser 21 is performed by analyser 3, then M_i = L_i and B_i = A_i. These autocorrelation coefficients are then supplied to weighting and adding means 22, 23 (equivalent to 5, 6) which receive also the autocorrelation vector R_i of the input signal from autocorrelator 4.

A measure of the spectral similarity between the input speech frame and the preceding speech frame is thus calculated; this may be the Itakura-Saito distortion measure between R_i of the present frame and B_i of the preceding frame, as disclosed above, or it may instead be derived by calculating the Itakura-Saito distortion measure for R_i and B_i of the present frame, and subtracting (in subtractor 25) the corresponding measure for the previous frame stored in buffer 24, to generate a spectral difference signal (in either case, the measure is preferably energy-normalised by dividing by R₀). The buffer 24 is then, of course, updated. This spectral difference signal, when thresholded by a thresholder 26 is, as discussed above, an indicator of the presence or absence of speech. We have found, however, that although this measure is excellent for distinguishing noise from unvoiced speech (a task which prior art systems are generally incapable of) it is in general rather less able to distinguish noise from voiced speech. Accordingly, there is preferably further provided within circuit 20 a voiced speech detection circuit comprising a pitch analyser 27 (which in practice may operate as part of a speech coder, and in particular may measure the long term predictor lag value produced in a multipulse LPC coder). The pitch analyser 27 produces a logic signal which is "true" when voiced speech is detected, and this signal, together with the thresholded measure derived from thresholder 26 (which will generally be "true" when unvoiced speech is present) are supplied to the inputs of a NOR gate 28 to generate a signal which is "false" when speech is present and "true" when noise is present. This signal is supplied to buffer 15 (or to inverse filter analyser 3) so that inverse filter coefficients L_i are only updated during noise periods.
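The gating logic just described can be sketched in a few lines. The function and class names are hypothetical; the two boolean inputs stand for the outputs of pitch analyser 27 and thresholder 26, and the store stands for buffer 15.

```python
def noise_update_enable(voiced_detected, spectral_change_detected):
    """NOR of the two speech indicators: True only when both are False."""
    return not (voiced_detected or spectral_change_detected)

class NoiseModelStore:
    """Holds the noise-period coefficients; overwritten only when enabled."""

    def __init__(self, coefficients):
        self.coefficients = list(coefficients)

    def offer(self, new_coefficients, update_enable):
        """Accept new coefficients only when the control signal allows it."""
        if update_enable:
            self.coefficients = list(new_coefficients)
        return self.coefficients

# Hypothetical coefficient vectors for a noise frame and a speech frame.
store = NoiseModelStore([1.0, -0.5])
after_noise = list(store.offer([1.0, -0.6], noise_update_enable(False, False)))
after_speech = list(store.offer([1.0, 0.9], noise_update_enable(True, False)))
```

Because the enable signal is derived without reference to the main detector's own output, a single misclassified frame at output 8 cannot by itself corrupt the stored noise model.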

Threshold adapter 29 is also connected to receive the non-speech signal control output of control signal generator circuit 20. The output of the threshold adapter 29 is supplied to thresholder 7. The threshold adapter operates to increment or decrement the threshold in steps which are a proportion of the instant threshold value, until the threshold approximates the noise power level (which may conveniently be derived from, for example, weighting and adding circuits 22, 23). When the input signal is very low, it may be desirable that the threshold is automatically set to a fixed, low, level since at the low signal levels the effect of signal quantisation produced by ADC 2 can produce unreliable results.
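One step of such an adapter might look like the following sketch. The function name and the step proportion of 1/16 are assumptions; the text says only that the step is a proportion of the instant threshold value and that adaptation takes place when speech is indicated to be absent. The fixed low level for very quiet inputs is not modelled here.

```python
def adapt_threshold(threshold, noise_level, noise_only, step_fraction=1.0 / 16.0):
    """Move the threshold one proportional step towards the noise level."""
    if not noise_only:
        return threshold                  # frozen while speech is indicated
    step = step_fraction * threshold      # step scales with current threshold
    if threshold < noise_level:
        return threshold + step
    if threshold > noise_level:
        return threshold - step
    return threshold

# Hypothetical values: threshold 100, noise level above and below it.
raised = adapt_threshold(100.0, 200.0, noise_only=True)
lowered = adapt_threshold(100.0, 50.0, noise_only=True)
frozen = adapt_threshold(100.0, 200.0, noise_only=False)
```

Repeated calls over successive noise frames walk the threshold towards the noise level in steps that shrink or grow with the threshold itself.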

There may be further provided "hangover" generating means 30, which operates to measure the duration of indications of speech after thresholder 7 and, when the presence of speech has been indicated for a period in excess of a predetermined time constant, the output is held high for a short "hangover" period. In this way, clipping of the middle of low-level speech bursts is avoided, and appropriate selection of the time constant prevents triggering of the hangover generator 30 by short spikes of noise which are falsely indicated as speech. It will of course be appreciated that all the above functions may be executed by a single suitably programmed digital processing means such as a Digital Signal Processing (DSP) chip, as part of an LPC codec thus implemented (this is the preferred implementation), or as a suitably programmed microcomputer or microcontroller chip with an associated memory device.

Conveniently, as described above, the voice detection apparatus may be implemented as part of an LPC codec. Alternatively, where autocorrelation coefficients of the signal or related measures (partial correlation, or "parcor", coefficients) are transmitted to a distant station the voice detection may take place distantly from the codec.
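The hangover behaviour can be sketched on a sequence of per-frame decisions. The function name and both parameters are hypothetical, and durations are counted in frames as an assumption; the text speaks only of a predetermined time constant and a short hangover period.

```python
def apply_hangover(decisions, min_speech_frames, hangover_frames):
    """Hold the output high after sufficiently long runs of speech frames."""
    out = []
    run = 0    # consecutive frames currently flagged as speech
    hang = 0   # hangover frames still available
    for speech in decisions:
        if speech:
            run += 1
            if run > min_speech_frames:
                hang = hangover_frames   # arm the hangover for this burst
            out.append(True)
        else:
            run = 0
            if hang > 0:
                hang -= 1
                out.append(True)         # output held high during hangover
            else:
                out.append(False)
    return out

# One isolated spike, then a three-frame burst, then silence.
raw = [True, False, False, False, True, True, True, False, False, False, False]
held = apply_hangover(raw, min_speech_frames=2, hangover_frames=2)
```

In this example the isolated first frame does not arm the hangover, while the three-frame burst is extended by two frames.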

I claim:
1. Voice activity detection apparatus comprising: (i) means for receiving an electrical input signal in which the presence or absence of signals representing speech is to be detected; (ii) means responsive to said means for receiving for periodically adaptively generating an electrical signal representing an estimated noise signal component of the input signal by producing the autocorrelation coefficients A_i of the impulse response of a FIR filter having a response approximating the inverse of the short term spectrum of the noise signal component; (iii) means responsive to said means for receiving for periodically forming from the input signal and the estimated noise representing signal an electrical signal representing a measure M of the spectral similarity between a portion of the input signal and the said estimated noise signal component, said measure forming means comprises means for producing electrical signals representing the autocorrelation coefficients R_i of the input signal, and means connected to receive R_i and A_i signals, and to calculate the measure M therefrom; and (iv) electrical means responsive to said means for forming for comparing the electrical signals representing said measure with a threshold value representing signal to produce an electrical output indicating the presence or absence of speech in the electrical input signal.
2. Apparatus according to claim 1, further comprising an input arranged to receive a second electrical input signal, similarly subject to noise, from which speech is absent, in which the generating means comprise LPC analysis means for deriving values of A_i from the second input signal.
3. Apparatus according to claim 1 in which the generating means includes an adaptive filter for generating said coefficients.
4. Apparatus according to claim 2 in which the means for producing the signals representing the autocorrelation coefficients of the input signal are arranged to do so in dependence upon the autocorrelation coefficients of several successive portions of the signal.
5. Apparatus according to claim 1 or 4, in which

    M = R_0 A_0 + 2 \sum_{i} R_i A_i.


6. Apparatus according to claim 1 or 4, in which ##EQU8##
7. Apparatus according to claim 1 or 4, in which said generating means comprises a buffer connected to store data from which the autocorrelation coefficients A_i of the said filter response may be obtained, in which the said filter response is periodically calculated from the signal by LPC analysis means, the apparatus being so connected and controlled that the measure M is calculated using the said stored data, and the said stored data is updated only from periods in which speech is indicated to be absent.
8. Apparatus according to claim 7 further comprising second voice activity detection means responsive to said input signal for indicating the absence of speech to control the updating of the stored data.
9. Apparatus according to claim 1 or 4, further comprising means for adjusting said threshold value during periods when speech is indicated to be absent.
10. Apparatus according to claim 9 further comprising second voice activity detection means responsive to said input signal to produce a control signal indicating the presence or absence of speech, said adjusting means being responsive to said control signal to prevent adjustment of said threshold value when speech is present.
11. Apparatus according to claim 9 in which said threshold value is, when adjusted, adjusted to be equal to the mean of the measure plus a term which is a fraction of the standard deviation of the measure.
12. Apparatus according to claim 10 further comprising means for adjusting the said threshold value during periods when speech is indicated to be absent, said second voice activity detection means serving also to prevent adjustment of the threshold value when speech is present.
13. Apparatus according to claim 10 in which said second voice activity detection means comprises means for generating a measure of the spectral similarity between a portion of the input signal and earlier portions of the input signal.
14. Apparatus according to claim 13 in which the similarity measure generating means of said second voice activity detection means comprises means for providing, from LPC filter data and autocorrelation data relating to a present portion of the input signal, a present distortion measure; means for providing an equivalent past frame distortion measure corresponding to a preceding portion of the input signal, and means for generating a signal indicating the degree of similarity therebetween as an indicator of speech presence or absence.
15. Apparatus according to claim 13, in which said second voice activity detection means further comprises voiced speech detection means comprising pitch analysis means, for generating a signal indicative of the presence of voiced speech, upon which the output of said second voice activity detection means also depends.
 16. Voice activityapparatus comprising:(i) means for receiving an electrical signal inwhich the presence or absence or signals representing speech is to bedetected; (ii) means responsive to said means for receiving forperiodically adaptively generating an electrical signal representing anestimated noise signal component of the input signal, said generatingmeans including analysis means operable to produce electrical signalsrepresentative of the coefficients of a filter having a spectralresponse which is the inverse of the frequency spectrum of the estimatednoise signal component; (iii) means responsive to said means forperiodically adaptively generating for periodically forming from theinput signal and the estimated noise representing signal and electricalsignal representing a measure of a spectral similarity between a portionof the input signal and the said estimated noise signal component, themeasure being proportional to a zero-order autocorrelation of the inputsignal after filtering by a filter having the said coefficients; and(iv) electrical means for comparing the measure with a threshold valueto produce an output indicating the presence or absence of speech.
 17. Amethod of detecting voice activity representing signals in an electricalinput signal, comprising(a) periodically adaptively generating anelectrical signal representing an estimated noise signal component ofthe input signal, and producing signals representing the coefficients ofa filter having a spectral response which is the inverse of thefrequency spectrum of the estimated noise signal component; (b)periodically forming from the input signal and the estimated noiserepresenting signal an electrical signal representing a measure of thespectral similarity between a portion of the input signal and the saidestimated noise signal component, the measure being proportional to azero-order autocorrelation of the input signal after filtering by afilter having the said coefficients; and (c) electrically comparing themeasure with a threshold valve to produce an output indicating thepresence or absence of speech.
 18. Voice activity detection apparatuscomprising:(i) means for receiving an electrical input signal in whichthe presence or absence of signals representing speech is to bedetected; (ii) analysis means responsive to said means for receivingoperable to produce electrical signals representing the coefficients ofa filter having a spectral response which is the inverse of thefrequency spectrum of the input signal; (iii) means for periodicallyadaptively generating an electrical signal representing an estimatednoise signal component of the input signal; (iv) electrical meansresponsive to said analysis means and said estimated noise generatingmeans for periodically forming from the filter coefficients and theestimated noise representing signal further signals representing ameasure of a spectral similarity between a portion of the input signaland the same estimated noise signal component, the measure beingproportional to a zero-order autocorrelation of the noise representingsignal after filtering by a filter having the same coefficients; and (v)means for comparing the measure with a threshold value to produce anoutput indicating the presence or absence of speech.
 19. A method of detecting voice activity representing signals in an electrical input signal, comprising: (a) producing electrical signals representing the coefficients of a filter having a spectral response which is the inverse of the frequency spectrum of the input signal; (b) periodically adaptively generating electrical signals representing an estimated noise signal component of the input signal; (c) periodically forming from the filter coefficients and the estimated noise representing signal an electrical signal representative of a measure of the spectral similarity between a portion of the input signal and the said estimated noise signal component, the measure being proportional to the zero-order autocorrelation of the noise representing signal after filtering by a filter having the said coefficients; and (d) comparing the measure with a threshold value to produce an output indicating the presence or absence of speech.
 20. A voice activity detection apparatus comprising: (i) a first voice activity detector which operates by forming electrical signals representing a measure of a spectral similarity between an electrical input signal and a speech free stored portion of an input signal to produce an electrical output signal indicating the presence or absence of speech in the input signal; (ii) a store for containing the stored portion of the input signal; and (iii) an auxiliary voice activity detector responsive to said electrical input signal to produce a second signal indicating the presence or absence of speech in the input signal, said second signal alone controlling the updating of said store, the auxiliary voice activity detector operating by forming an electrical signal representing a measure of a spectral similarity between a current input signal and an earlier portion of the input signal.
 21. A voice activity detection apparatus comprising: (i) means for receiving an electrical input signal in which the presence or absence of signals representing speech is to be detected; (ii) a store for storing an estimated noise representation signal; (iii) means responsive to said means for receiving for periodically forming from the input signal and the stored estimated noise representation signal an electrical signal representing a measurement of the spectral similarity between a portion of the input signal and the said estimated noise signal component; (iv) electrical means for comparing the measure with a threshold value to produce an output indicating the presence or absence of speech; (v) an auxiliary voice activity detector, operating by forming an electrical signal representing a measure of spectral similarity between the input signal and a preceding portion of the input signal to produce a control signal indicating the presence or absence of speech; and (vi) store updating means operable to update the store from said electrical input signal only when said control signal indicates that speech is absent.
 22. Apparatus according to claim 21, further comprising means for adjusting the said threshold value during periods when speech is indicated by said control signal to be absent.
 23. Apparatus according to claim 21 or 22, in which said auxiliary voice activity detector further comprises voiced speech detection means comprising pitch analysis means for generating a signal indicative of the presence of voiced speech, upon which the control signal produced by said auxiliary voice activity detector also depends.
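By way of illustration only, the measure recited in claim 17 (a zero-order autocorrelation of the input signal after filtering by the inverse noise filter) can be sketched in Python as follows. The sketch assumes that formula (I) of the abstract takes the form M = R0·H0 + 2·Σ Ri·Hi for i = 1 to N, with Ri the autocorrelation coefficients of the input frame and Hi those of the filter impulse response. The function names, the sample values and the fixed threshold are hypothetical and are not taken from the specification, which uses an adaptive threshold.

```python
def autocorr(x, max_lag):
    """R[i] = sum over n of x[n] * x[n + i], for i = 0..max_lag."""
    return [sum(x[n] * x[n + i] for n in range(len(x) - i))
            for i in range(max_lag + 1)]


def measure(R, H):
    """Assumed form of formula (I): M = R0*H0 + 2 * sum_{i>=1} Ri*Hi."""
    return R[0] * H[0] + 2.0 * sum(r * h for r, h in zip(R[1:], H[1:]))


def filtered_energy(x, h):
    """Zero-order autocorrelation (energy) of x after FIR filtering by h.

    Uses the full convolution, so it equals measure() exactly
    up to floating-point rounding.
    """
    y = [sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
         for n in range(len(x) + len(h) - 1)]
    return sum(v * v for v in y)


def speech_present(frame, h, threshold):
    """Compare the measure for one frame with a (hypothetical) fixed threshold."""
    order = len(h) - 1
    return measure(autocorr(frame, order), autocorr(h, order)) > threshold


# Hypothetical frame and inverse noise filter coefficients.
frame = [1.0, -2.0, 3.0, 0.5, -1.5]
h = [1.0, -0.9, 0.2]
M = measure(autocorr(frame, len(h) - 1), autocorr(h, len(h) - 1))
```

Forming the measure from the autocorrelation coefficients avoids filtering the signal itself, which is why quantities already available from the LPC coder can be reused. Claims 18 and 19 recite the converse arrangement, in which the noise representing signal is filtered by the inverse of the input signal spectrum; the same `measure` function would apply with the roles of the two coefficient sets exchanged.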